Chromosome preference of disease genes and vectorization for the prediction of non-coding disease genes

Authors

  • Hui Peng
  • Chaowang Lan
  • Yuansheng Liu
  • Tao Liu
  • Michael Blumenstein
  • Jinyan Li
Abstract

Disease-related protein-coding genes have been widely studied, but disease-related non-coding genes remain largely unknown. This work introduces a new vector representation of diseases and applies the vectorized data to a positive-unlabeled learning algorithm to predict and rank disease-related long non-coding RNA (lncRNA) genes. The representation consists of two sub-vectors. The first has 45 elements, which characterize the information entropies of the distribution of disease genes over 45 chromosome substructures. This idea is supported by our observation that some substructures (e.g., the chromosome 6 p-arm) are highly preferred by disease-related protein-coding genes, while others (e.g., the chromosome 21 p-arm) are not favored at all. The second sub-vector is 30-dimensional and characterizes the distribution of disease-gene-enriched KEGG pathways in comparison with our manually created pathway groups. The second sub-vector complements the first in differentiating between diseases. Our prediction method outperforms state-of-the-art methods on benchmark datasets for prioritizing disease-related lncRNA genes. The method also works well when only the sequence information of an lncRNA gene is known, or even when a given disease has no currently recognized long non-coding genes.
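The abstract does not give the exact formula behind the 45-element sub-vector, so the following is only a minimal sketch of one plausible reading: each element is the Shannon entropy term -p log2(p), where p is the fraction of a disease's known genes that fall in a given chromosome substructure. The function name `substructure_entropy_vector`, the four-item substructure list, and the toy gene assignments are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def substructure_entropy_vector(gene_arms, arms):
    """Return one entropy term per substructure for a disease's gene set.

    gene_arms: the substructure label of each known disease gene.
    arms: ordered list of substructures (the paper uses 45; assumed here).
    NOTE: -p*log2(p) per substructure is an assumed formulation.
    """
    counts = Counter(gene_arms)
    # Only genes located on one of the listed substructures are counted.
    total = sum(counts[arm] for arm in arms)
    vector = []
    for arm in arms:
        p = counts[arm] / total if total else 0.0
        # An empty substructure contributes 0 (the limit of -p*log2(p)).
        vector.append(-p * math.log2(p) if p > 0 else 0.0)
    return vector


# Toy example with four substructures instead of the paper's 45.
arms = ["6p", "6q", "21p", "21q"]
genes = ["6p", "6p", "6q", "21q"]  # hypothetical disease gene locations
vec = substructure_entropy_vector(genes, arms)
```

In this toy input no gene lies on 21p, so that element is zero, which mirrors the abstract's remark that some substructures are not favored by disease genes.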

Similar resources

Direct Bisulfite Sequencing and Methylation Specific PCR to Detect Methylation of p15INK4b and F7 genes in Coronary Artery Disease Patients

Genome-Wide Association Studies (GWAS) have identified genetic variants contributing to the risk of cardiovascular disease (CVD) at the chromosome 9p21 locus. The chromosome 9p21 is an important susceptibility locus for several multifactorial diseases like ischemic stroke, aortic aneurysm, type 2 diabetes mellitus and coronary artery disease (CAD). F7 gene because of its role in activating the ...

Long non-coding RNAs and their significance in human diseases

Protein-coding genes account for only a small fraction of the human genome and most of the genomic sequences are transcriptionally silent, but recent observations indicate significant functional elements, including non-coding protein transcripts in the human genome. Long non-coding RNAs (lncRNAs) have been defined as transcripts of >200 nucleotides without protein-coding capacity that perform t...

Effect of Memantine on Expression of NAT-Rad18, Rad18 and Sorl1 Genes in Rat Model of Alzheimer's Disease

Background and Objective: Dysregulation of the expression of long non-coding RNAs (lncRNAs) has a potential role in progressive brain disorders such as Alzheimer's disease. This study aimed to analyze the apoptosis and expression of 51A and NAT-Rad18 lncRNAs and their target genes in brain tissue and peripheral blood mononuclear cells (PBMCs) of the rat model of AD, before and after memantine ...

Comparing MicroRNA Target Gene Predictions Related to Alzheimer's Disease Using Online Bioinformatics Tools

Introduction: Predicting the target genes of microRNAs with bioinformatics tools saves the time and cost of experimental analyses. In the present study, predictions of microRNA target genes relevant to Alzheimer's disease (AD), made with different bioinformatics tools, were compared with experimentally reported data. Method: A total of 41 microRNAs associated with 21 essential ge...

Journal title:

Volume 8, issue

Pages -

Publication date: 2017